Introducing linguistic constraints into statistical language modeling

Author

  • Petra Geutner
Abstract

Building robust stochastic language models is a major issue in speech recognition systems. Conventional word-based n-gram models do not capture any linguistic constraints inherent in speech. In this paper, the notion of function and content words (open/closed word classes) is used to provide linguistic knowledge that can be incorporated into language models. Function words are articles, prepositions, and personal pronouns; content words are nouns, verbs, adjectives, and adverbs. Based on this class definition, which results in function and content word markers, a new language model is defined. A combination of the word-based model with this new model will be introduced. The combined model shows modest improvements both in perplexity results and recognition performance.
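The abstract does not say how the marker-based model is defined or how the two models are combined. As a rough illustration only, the sketch below assumes a class-based bigram over function/content markers and a linear interpolation with a word bigram model. The word list `FUNCTION_WORDS`, the class `CombinedBigramModel`, and the weight `lam` are hypothetical names chosen for this example and are not taken from the paper.

```python
from collections import Counter

# Hypothetical, tiny closed-class list for illustration only.
FUNCTION_WORDS = {"the", "a", "an", "in", "on", "of", "to", "he", "she", "it", "they"}


def marker(word):
    """Return 'F' for function words and 'C' for content words."""
    return "F" if word.lower() in FUNCTION_WORDS else "C"


class CombinedBigramModel:
    """Unsmoothed word bigram interpolated with a marker (F/C) bigram.

    Assumption: the combination is a linear interpolation with weight `lam`;
    the source abstract does not specify the actual scheme.
    """

    def __init__(self, sentences, lam=0.7):
        self.lam = lam
        self.word_bigrams = Counter()    # counts of (previous word, word)
        self.word_context = Counter()    # counts of previous word
        self.marker_bigrams = Counter()  # counts of (previous marker, marker)
        self.marker_context = Counter()  # counts of previous marker
        self.word_counts = Counter()     # counts of predicted words
        self.marker_counts = Counter()   # counts of predicted markers
        for sent in sentences:
            for prev, cur in zip(sent, sent[1:]):
                self.word_bigrams[(prev, cur)] += 1
                self.word_context[prev] += 1
                self.marker_bigrams[(marker(prev), marker(cur))] += 1
                self.marker_context[marker(prev)] += 1
                self.word_counts[cur] += 1
                self.marker_counts[marker(cur)] += 1

    def p_word(self, prev, cur):
        """Maximum-likelihood word bigram probability P(cur | prev)."""
        ctx = self.word_context[prev]
        return self.word_bigrams[(prev, cur)] / ctx if ctx else 0.0

    def p_marker(self, prev, cur):
        """Class-based estimate P(m_cur | m_prev) * P(cur | m_cur)."""
        m_prev, m_cur = marker(prev), marker(cur)
        ctx = self.marker_context[m_prev]
        if not ctx or not self.marker_counts[m_cur]:
            return 0.0
        transition = self.marker_bigrams[(m_prev, m_cur)] / ctx
        emission = self.word_counts[cur] / self.marker_counts[m_cur]
        return transition * emission

    def p(self, prev, cur):
        """Linear interpolation of the word model and the marker model."""
        return self.lam * self.p_word(prev, cur) + (1 - self.lam) * self.p_marker(prev, cur)


corpus = [
    ["the", "dog", "chased", "a", "cat"],
    ["a", "dog", "saw", "the", "cat"],
]
model = CombinedBigramModel(corpus, lam=0.7)
print(model.p("the", "dog"))  # bigram seen in the toy corpus
print(model.p("the", "saw"))  # unseen word bigram, still gets mass via the markers
```

In this sketch the marker model lets a word pair that never occurred together receive a non-zero probability, as long as its function/content pattern was observed. A real system would also need smoothing and a tuned interpolation weight.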

Similar resources

The Impact of Linguistics on Conceptual Models: Consistency and Understandability

This paper describes a vision in which linguistic knowledge and theories are introduced into conceptual modeling, and it sums up the advantages achieved by this approach. We will show how the extension of conceptual modeling techniques with linguistic theories increases their expressive power, the capability to formalize well-known conceptual aspects, like object roles and constraints, and thei...

Towards a unified framework for sub-lexical and supra-lexical linguistic modeling

Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains to be one of the fundam...

Statistical and Linguistic Clustering for Language Modeling in ASR

In this work several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The language models proposed were evaluated, as usual, in terms of perplexity of the text corpus. Then they were integrated into an ASR system and also evaluated in terms of system performance. It can be seen that category-based...

Towards Automatic Identification Of Singing Language In Popular Music Recordings

The automatic analysis of singing from music is an important and challenging issue within the research target of content-based retrieval of music information. As part of this research target, this study presents a first attempt to automatically identify the language sung in a music recording. It is assumed that each language has its own set of constraints that specify which of the basic linguis...

A New Framework for Sign Language Recognition based on 3D Handshape Identification and Linguistic Modeling

Current approaches to sign recognition by computer generally have at least some of the following limitations: they rely on laboratory conditions for sign production, are limited to a small vocabulary, rely on 2D modeling (and therefore cannot deal with occlusions and off-plane rotations), and/or achieve limited success. Here we propose a new framework that (1) provides a new tracking method les...

Publication date: 1996